[57] ABSTRACT

Error correction circuitry attempts to detect and correct on 
the fly erroneous words within random access memory 
(RAM) within a computer system. RAM errors are scrubbed 
or corrected back in the memory without delaying the 
memory access cycle. Rather, the address of the section or 
row of RAM that contains the correctable error is latched for 
later use by an interrupt-driven firmware memory-error
scrub routine. This routine reads and rewrites each word
within the indicated memory section, so the erroneous word is
read, corrected on-the-fly as it is read, and then rewritten
back into memory correctly. If the size of the memory 
section exceeds a predetermined threshold, then the process 
of reading and re-writing that section is divided into smaller 
sub-processes that are distributed in time using a delayed 
interrupt mechanism. Duration of each memory scrubbing 
subprocess is kept short enough that the response time of the 
computer system is not impaired with the housekeeping task 
of scrubbing RAM memory errors. System management 
interrupts and firmware may be used to implement the 
memory-error scrub routine, which makes it independent of 
and transparent to the various operating systems that may be 
run on the computer system. 

12 Claims, 6 Drawing Sheets 
TIME-DISTRIBUTED ECC SCRUBBING TO CORRECT MEMORY ERRORS

FIELD OF THE INVENTION

The present invention relates to error correction within
digital computer systems. In particular, it relates to
scrubbing, or correcting, errors in memory in a
time-distributed manner.

BACKGROUND OF THE INVENTION

When data is read back from a memory in which it has been
stored, it occasionally happens that an error occurs, i.e.
that the data read back is not identical to the data
previously stored.

A number of error correcting codes (ECC) are known in the
prior art that are capable of not only detecting but also
correcting errors. Typically, these codes can detect a broader
range of errors than they can correct. For example, a DED-SEC
code is capable of detecting any double errors that occur
within the data field the code covers (i.e. errors in which
two bits within the field are erroneous) and of correcting any
single errors (i.e. only one wrong bit).

As applied to main memory or Random Access Memory (RAM)
within a computer system, it may be desirable to consider each
64-bit double word as its own data field, i.e., to store along
with it its own ECC or redundancy check information. As the
computer system reads words from memory, this ECC information
would be checked so that errors in the word could be detected
and hopefully corrected.

If the ECC hardware detects a correctable error, then it is
desirable to correct the word being read on-the-fly so as to
provide the processor or I/O controller that is reading main
memory with a corrected word. This is a performance critical
task because accessing main memory is one of the most
performance-critical aspects of computer system design. Any
improvement or degradation in the latency between an access
request and the delivery of the data requested often has a
substantial effect on overall system performance.

It is further desirable to correct the word in main memory
because errors accumulate over time. If subsequent errors
occur within the same word, then they may convert a
correctable error into an un-correctable error. The process of
correcting the data stored in memory is called scrubbing the
memory. Compared with the on-the-fly correction described
above, the process of correcting the data stored in main
memory is more time consuming and more costly in terms of
requiring additional hardware and/or software to implement it.

In one approach to scrubbing memory, it is desired to not
impose any of the error correction task on software. In this
case, it would be desirable to include in the memory
controller a state machine that temporarily suspends the
normal operation of the memory and writes the corrected word
back to the erroneous memory location. Disadvantages of this
approach include both the complexity of the hardware that
would be required to do the write back and the performance
penalty because the memory would not be accessible for other
purposes until the correct and re-write process is completed.

In another approach to scrubbing memory, it is desired to
keep hardware costs and complexity at a minimum and impose
most of the error correction task on software. Such an
approach would find it desirable to generate an interrupt to
activate software or firmware, executing on the processor, to
correct the erroneous memory location. Unfortunately, in some
systems the limited number of interrupt request signals or
vectors that are available are already utilized. Also, a
different version of the correction routine may be required
for each different operating system that will be run on the
computer system.

SUMMARY OF THE INVENTION

A computer system includes a processor and a memory
[illegible].

When a controller for the memory determines that a memory
word contains a correctable error, it indicates to the
processor, via an interrupt, the section of memory to which
the erroneous word belongs. In response, the processor reads
and rewrites each word within that section of the memory. The
interrupt mechanism used is distinct from that used for
input/output interrupts.

In some embodiments, the memory controller generates the
error correction check bits when data is written to the
memory. In some embodiments, the memory controller corrects
the memory data as it is read from the memory into the
processor. In some embodiments, the address space, processor
state and register set used by the processor for the reading
and re-writing process is distinct from that used during
normal processor operation and distinct from that used for
input/output interrupts.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated in the following
drawings, in which known circuits are shown in block-diagram
form [illegible] explanation [illegible] understanding. The
present invention should not be taken as being limited to the
preferred embodiments and design alternatives illustrated.

FIG. 1 shows the components of the error correcting scrubber
of the present invention and their interconnections.

FIG. 2 shows the time sequence of the scrubbing operation
[illegible] associated control signals.

FIG. 3 shows how one embodiment interleaves check bits and
data bits within a memory word and how it divides RAM into N
sections of four subsections each.

FIG. 4 shows [illegible] of steps that the system management
firmware goes through to scrub memory errors.

FIG. 5 shows the relationship, according to one embodiment of
the present invention, among the system management firmware
and the software, hardware and basic input/output system
(BIOS) firmware of an example computer system.

FIG. 6 shows the error correction matrix used in one
embodiment of the present invention. This matrix determines
both how the error check bits are generated from the data bits
and how a single erroneous bit can be identified from the
syndrome bits.

DETAILED DESCRIPTION OF THE INVENTION

Architecture

Disclosed herein are various alternative embodiments and
design alternatives of the present invention which, however,
should not be taken as being limited to the embodiments and
alternatives described. One skilled in the art will recognize
alternative embodiments and various changes in form and detail
that may be employed while practicing the present invention
without departing from its principles, spirit or scope.

The present invention is a method and apparatus for
correcting erroneous words within a computer system 
memory. FIG. 1 is a system architecture diagram of the 
memory and processor portion of a computer system that 
uses a RAM scrubber according to one embodiment of the 
present invention. 

As words are read from memory, ECC circuitry attempts 
to detect and, if possible, proceeds to correct errors on the 
fly, i.e. before they are provided to the requester. 

If a correctable error occurs, the ECC circuitry performs 
the correction and provides the corrected data word to the 
requester. In one embodiment, a one-cycle delay in memory 
access latency accommodates the error correction process. 

The ECC circuitry scrubs, or corrects, the errors in RAM 
without delaying the memory access cycle to do so. The 
corrected word is not immediately rewritten into RAM. 
Rather, an indication of the section of RAM in which the 
error occurred is latched for later use by a firmware memory 
scrubbing routine. 

In one embodiment, the ECC circuitry does not require 
that the full word address of the erroneous word be latched. 
Rather, in order to reduce hardware cost and complexity, 
substantially fewer bits than a full word address are stored — 
the stored bits indicating only which section of memory 
contains the erroneous word. In one embodiment, each 
section corresponds to a memory row and the row address is 
latched to indicate the section to be scrubbed. 
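
As a rough illustration of latching fewer bits than a full word address, the sketch below derives a section (row) index by discarding the low-order address bits. The word size, the row size `WORDS_PER_ROW`, and the names `section_of`, `section_bounds` and `SectionLatch` are assumptions made for this example only; they are not taken from the described hardware.

```python
from typing import Optional, Tuple

# Hypothetical geometry, chosen only for illustration.
WORD_BYTES = 8            # assumed: one 64-bit data word
WORDS_PER_ROW = 1024      # assumed number of words in one row (section)
ROW_BYTES = WORD_BYTES * WORDS_PER_ROW


def section_of(byte_address: int) -> int:
    """Return the section (row) index that contains byte_address."""
    return byte_address // ROW_BYTES


def section_bounds(section: int) -> Tuple[int, int]:
    """Return (first_byte, one_past_last_byte) of a section."""
    start = section * ROW_BYTES
    return start, start + ROW_BYTES


class SectionLatch:
    """Models a register that keeps only the section index of a failing word."""

    def __init__(self) -> None:
        self.value: Optional[int] = None

    def latch(self, byte_address: int) -> None:
        # Only the section index is kept; the word offset within the row is dropped.
        self.value = section_of(byte_address)
```

Because the word offset is discarded, the scrub routine has to sweep the whole section returned by `section_bounds` rather than a single word.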

The section address is provided to an interrupt-driven 
firmware routine that scrubs that section, i.e. that reads and 
rewrites each memory word within that section. This ensures 
that the erroneous word is read, corrected on-the-fly as it is 
read, and then rewritten. When the word is rewritten back 
into memory, it is stored correctly. This is desirable because 
errors accumulate over time and a second error within the 
same memory word is likely to make that word uncorrect- 
able. 

If the size of a memory section exceeds a predetermined 
threshold, then the process of scrubbing that section is 
divided into smaller sub-processes. These sub-processes are 
distributed in time using delayed interrupts. By keeping the 
duration of each subprocess below a threshold, the ECC 
circuitry ensures that the response time of the computer
system is not significantly impaired by the housekeeping 
task of scrubbing memory errors. 
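
The time-distributed scrub described above can be sketched in software as follows. This is a minimal model under stated assumptions, not the firmware itself: `FakeRam` is a toy memory whose `read` stands in for on-the-fly ECC correction, `plan_slices` and `scrub_interrupt` are hypothetical names, and the threshold and number of slices are arbitrary example values. The `schedule` callback stands in for requesting a delayed interrupt.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Slice = Tuple[int, int]  # half-open word range [lo, hi)


class FakeRam:
    """Toy RAM model: read() stands in for on-the-fly ECC correction."""

    def __init__(self, words: List[int]) -> None:
        self.raw = list(words)    # what the cells currently hold
        self.good = list(words)   # what ECC correction would recover

    def flip(self, addr: int, bit: int) -> None:
        self.raw[addr] ^= 1 << bit          # inject a single-bit error

    def read(self, addr: int) -> int:
        return self.good[addr]              # corrected value seen by the reader

    def write(self, addr: int, value: int) -> None:
        self.raw[addr] = self.good[addr] = value


def plan_slices(start: int, length: int, threshold: int, parts: int) -> List[Slice]:
    """One slice if the section is small enough, else at most `parts` slices."""
    if length <= threshold:
        return [(start, start + length)]
    step = -(-length // parts)              # ceiling division
    end = start + length
    return [(lo, min(lo + step, end)) for lo in range(start, end, step)]


def scrub_interrupt(ram: FakeRam, pending: Deque[Slice],
                    schedule: Callable[[], None]) -> None:
    """One interrupt: read and rewrite one slice, then reschedule if work remains."""
    lo, hi = pending.popleft()
    for addr in range(lo, hi):
        ram.write(addr, ram.read(addr))
    if pending:
        schedule()                          # stands in for a delayed interrupt request


def demo() -> Tuple[int, int]:
    ram = FakeRam(list(range(32)))
    ram.flip(19, 3)                         # corrupt one word in the section
    pending = deque(plan_slices(start=16, length=16, threshold=8, parts=4))
    requests: List[str] = []
    while pending:                          # each pass models one interrupt
        scrub_interrupt(ram, pending, lambda: requests.append("smi"))
    return ram.raw[19], len(requests)
```

In this model each call to `scrub_interrupt` touches only one slice, so the time spent per interrupt is bounded by the slice size rather than by the section size.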

If an un-correctable error occurs, the ECC circuitry gen- 
erates a software interrupt. Often, such an error is not 
recoverable and the process executing must be aborted or the 
system must be re-booted. 

In one embodiment, system management interrupts and 
firmware provide this memory error scrubbing in a manner 
that is independent of and transparent to the operating 
system running on the computer system. System manage- 
ment interrupts (SMIs) occupy an interrupt vector space that 
is independent of that of regular interrupts, such as input/ 
output interrupts. System management interrupt service rou- 
tines execute in a program address space that is independent 
of that of regular program execution and of that of regular 
interrupts. System management interrupt service routines 
make use of processor state information that is independent 
of that used for regular program execution and of that used 
for regular interrupts. 

In this embodiment, there are no conflicts or contention 
for interrupt vectors or program address space between the 
memory scrubbing routine and any normal program or 
interrupt activities. Further, there are no operating-system 
specific drivers required to support memory scrubbing. The 
advantages of this embodiment include enhancing the
reliability and the platform or system independence of the
ECC scrub operation.

Each word in RAM memory 102 as shown in FIG. 1
comprises both data bits and error correction code (ECC) or
error check bits. In one embodiment, each word comprises
64 data bits and 8 check bits. Typically, the memory being
checked for errors is a random access memory (RAM), such
as the computer system main memory, or input/output buffer
memory. Nevertheless, the ECC circuitry and methods
described herein are adaptable to any digital memory that can
be written on a word-by-word basis.

All reads from and writes to memory 102 pass through
memory and ECC controller 101. Whenever a word is
written to RAM memory 102, memory and ECC controller
101 generates error check bits from the data bits provided by
the device requesting the write, such as processor 103. In
one embodiment, partial word writes are supported by
means of a read-modify-write cycle, as is known in the prior
art.

Typically memory and ECC controller 101 also provides
read and write access to RAM memory 102 to other devices
(not shown), such as peripheral device controllers. Typically
this is done via a system bus (not shown).

When a word within RAM memory 102 is read, memory
and ECC controller 101 computes a syndrome based on the
values of the data and check bits read. If the syndrome is 0,
no error occurred. This is the most prevalent situation.
Occasionally, an error occurs and the word from RAM
memory 102 has one or more bits reversed. In the case of a
correctable error, memory and ECC controller 101 corrects
the erroneous word on the fly, that is, it provides to the
requester a corrected version of the word requested.

The present invention makes no attempt to correct the
contents of RAM memory 102 as it is being read. Rather,
when memory and ECC controller 101 detects a correctable
error, it activates correctable error signal 122, which signals
system management interrupt controller and scheduler 105
to initiate a memory scrub operation. This signaling may be
done via a system bus, to which both memory and ECC
controller 101 and system management interrupt controller and
scheduler 105 are coupled.

At the appropriate time (there may be higher priority
interrupts pending), interrupt controller and scheduler 105
generates a system management interrupt by activating
system management interrupt request signal 120. Other
embodiments of the invention could use the computer
system's non-maskable interrupt mechanism or its regular
interrupt mechanism.

In response, though not necessarily immediately,
processor 103 acknowledges the system management interrupt
(SMI) request and transfers control to a memory scrubbing
interrupt service routine that is resident in system
management memory 104. System management memory 104 is
typically a non-volatile memory, such as a programmable
read only memory (PROM) or flash memory. This memory
may also contain the computer system's basic input output
system (BIOS).

The memory scrubbing routine reads the contents of
section address register 130, which is part of memory and
ECC controller 101. Section address register 130 indicates
which section of memory needs to be scrubbed. The routine may
or may not complete the scrubbing operation at one time. If it
does not, it activates schedule system management interrupt
signal 121. This causes interrupt controller and scheduler
105 to schedule another system management interrupt after
a programmable delay.

In one embodiment, memory and ECC controller 101 is
implemented in a first integrated circuit that also couples
processor 103 and RAM memory 102 to a high-speed system bus
(not shown) that complies with the well known peripheral
component interconnect (PCI) specification. In this
embodiment, system management interrupt controller and
scheduler 105 is implemented in a second integrated circuit
that also couples this PCI bus with an industry standard
architecture (ISA) bus. In this embodiment, schedule system
management interrupt signal 121 is implemented by writing
specified values to specified control registers within the
second integrated circuit.

Operation

FIG. 2 is an example timing diagram showing how scrubbing
operation 203 is distributed in time. Scrubbing operation 203
is active during each of time periods 206, which are separated
by substantial time intervals. Each time period 206 is
initiated by a corresponding activation 205 of system
management interrupt request signal 120. In the particular
example sequence shown in FIG. 2, a single activation of
correctable error signal 122 (i.e., a single occurrence of a
correctable memory error) results in three activations 205 of
system management interrupt request signal 120.

The first activation 205 is generated by system management
interrupt controller and scheduler 105 in response to
activation 204 of correctable error signal 122. Correctable
error signal 122 is generated by memory and ECC controller
101.

Each subsequent activation 205 is generated by system
management interrupt controller and scheduler 105, in response
to but after a programmable delay from each activation 207 of
schedule system management interrupt signal 121. Schedule
system management interrupt signal 121 is generated by
processor 103 acting under control of system management
firmware 104.

When system management firmware 104 completes scrubbing the
section of memory that contains the correctable error, it does
not schedule another system management interrupt. Scrubbing
operation 203 is not active again until memory and ECC
controller 101 detects another correctable error and activates
correctable error signal 122.

FIG. 3 is a memory map showing the layout of RAM memory 102
according to one embodiment. In this embodiment, RAM memory
102 is divided into N sections, numbered 1 to N. To shorten
the duration of each of time periods 206, each section of RAM
memory 102 is divided into four subsections, denoted "a"
through "d".

For example, if a correctable error occurs within Section 2,
then subsection 2a is scrubbed during one time period 206 and
another system management interrupt is scheduled, then
subsection 2b is scrubbed during another time period 206 and
another system management interrupt is scheduled, then
subsection 2c is scrubbed during another time period 206 and
another system management interrupt is scheduled, then
subsection 2d is scrubbed during a final time period 206.

The memory map of FIG. 3 also shows, according to one
embodiment of the invention, how the ECC or check bits are
interleaved among the data bits. Using this particular
interleaving and the particular ECC code shown in FIG. 6, the
present invention detects any error that is confined within a
single 4-bit nibble (i.e., bits 0 to 3, 4 to 7, etc.).

RAM memory 102 may be implemented using a series of
integrated circuits (ICs) each of which holds one nibble's
worth of data for a number of words. If one such IC, which may
be a single in-line memory module (SIMM), is missing or
defective, then all of the bits of that nibble can be
erroneous. Because this is a common failure mode, it is
desirable to be able to detect that.

The bit order within the code word according to this
embodiment is as follows:

1. Data bits 0 to 25
2. Check bit 2
3. Check bit [illegible]
4. Data bits 26 to 31
5. Check bit 3
6. Check bit 4
7. Data bits 32 to 57
8. Check bit 6
9. Check bit 1
10. Data bits 58 to 63
11. Check bit 7
12. Check bit [illegible]

In another embodiment, the error code of FIG. 6 is used and
the check bits, if used, are stored in bits 64 to 71 of the
memory word. When the computer system is initially booted at
power on self test (POST) time, then the system BIOS can
determine or look up whether or not the system is ECC capable,
i.e. whether bits 64 to 71 are actually present in RAM memory
102. The BIOS enables or disables memory ECC checking
accordingly. This embodiment allows the same system design and
components to be used both for a lower-cost computer system
that does not have memory error detection and correction
capabilities and a higher-reliability computer system that
does.

FIG. 4 is a flow chart showing the steps within the memory
scrub interrupt service routine. This interrupt handler starts
401 when processor 103 acknowledges an occurrence of a system
management interrupt. Next, processor 103 in step 402
determines whether the active interrupt is a memory scrub
interrupt, in which case control passes to step 404. Otherwise
whatever other system management event occurred (a system
power management event, for example) is serviced in step 403.

Step 404 determines whether this is the first pass, or the
first occurrence of a system management interrupt request 205
due to a particular correctable error event 204. If not,
control passes to step 412. If so, control passes to step 405,
which reads, from section address register 130 within memory
and ECC controller 101, the address of the section within RAM
memory 102 that contains the word with a correctable error.

Next, step 406 reads or determines the size of this section.
Typically, each section is the same size, but as more memory
is added to the computer system each section contains more
words. Step 408 tests if the size of this section is less than
or equal to a predetermined limit, 8 megabytes (MB) in the
particular case shown in FIG. 4. If so, then the entire
section is scrubbed in step 407, and the system management
service routine terminates in step 415.

If the size of the section to be scrubbed is greater than the
limit, then, in step 409, the first subsection of the memory
section containing the error is scrubbed. Next in step 410,
another memory scrub interrupt is scheduled to occur after a
predetermined delay, and the system management service routine
terminates in step 415. Schedule system management interrupt
signal 121 is used for this scheduling.

In step 412, the next memory subsection is scrubbed. Next,
step 413 determines whether or not there is another memory
subsection to be scrubbed. If not, then the system management
service routine terminates in step 415. If so, then in step
414, another memory scrub interrupt is scheduled to occur
after a predetermined delay, and the system management service
routine terminates in step 415.

Independent System Management Firmware and Interrupt Requests

FIG. 5 shows how system management firmware 104 fits
in with the hardware, software and other firmware
components of an example computer system in a way that is
independent of, and transparent to, operating system 511.

System management interrupt controller and scheduler
105 schedules interrupts that activate system management
firmware 104. It also receives requests from system
management firmware 104 to schedule such interrupts to occur
after a specified delay.

System management firmware 104 is independent of
BIOS firmware 521, though both may reside in the same
non-volatile memory device within the computer system.
System management interrupt request signal 120 is
independent of the interrupt request control signals that
communicate between peripheral devices 531 and BIOS
firmware 521 or software device drivers 512.

BIOS firmware 521 in a typical system supports basic
input and output operations, such as display and keyboard
control functions. Peripheral devices 531 and operating
system 511 communicate by means of device interrupts
handled by the BIOS and by means of the OS making calls
to BIOS routines.

Other input and output operations are supported by device
drivers 512. In these cases, applications software 510 and
peripheral devices 531 communicate with each other by
means of interrupts handled by the device drivers and device
driver calls respectively. Software device drivers 512 are
used instead of drivers within BIOS firmware 521 in the case
of more complex peripheral devices such as network
interface cards or of more advanced operating systems such as
Windows NT™ or Windows 95™.

System management firmware 104 performs the memory
scrub operation of the present invention without interfering
in any way with peripheral devices 531, BIOS firmware 521,
device drivers 512 (if used), operating system 511 or
applications software 510.
An Example ECC Code and Algorithm 

The present invention can be used with a variety of ECC
codes, one of which is illustrated in FIG. 6. This particular
ECC code started with Rao and Fujiwara's description* of a
method for constructing a SEC-DED-S4ED rotational code
that protects 64 data bits with 8 check bits. This code was
augmented with the unused weight-3 column vectors to
produce a code with length 72 that retains the SEC-DED-
S4ED and rotational properties, and is symmetric.

* T. R. N. Rao and E. Fujiwara, Error-Control Coding for Computer Systems,
Prentice Hall 1989, p. 287-293.

The first 64 columns of FIG. 6, i.e. those labeled data bits, 
show the G-matrix of the ECC code used in this embodi- 
ment. Each row of the G-matrix shows how to compute, on 
writing RAM memory 102, the corresponding check bit. The 
first 72 columns of FIG. 6, i.e. those labeled data bits and 
check bits, show the H-matrix of the ECC code used in this 
embodiment. Each row of the H-matrix shows how to 
compute, on reading RAM memory 102, the corresponding 
syndrome bit. 

When writing a word into RAM memory 102, memory 
and ECC controller 101 computes the 8 check bits as 
follows: 

For the check bit N, select the N'th row in the G-matrix, 
where the rows are numbered 0 to 7. The 64 columns 
of the G-matrix correspond to the 64 bits of the word 
specified by the device that is requesting the memory 
write operation. 

Compute the 1-bit sum (i.e. the modulo-2 sum) of the data
bits that are marked with a 1 in the selected row. That
sum is the value of check bit N.

Write the 8 check bits computed above into the memory
along with the 64 data bits of the word being written.
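
The check-bit steps above can be sketched on a much smaller toy code. The actual matrix of FIG. 6 is not reproduced in this text, so the sketch assumes a hypothetical code with 8 data bits and 5 check bits in which each data bit is assigned a distinct weight-3 column; `DATA_COLUMNS`, `g_row`, `parity` and `check_bits` are illustrative names, not part of the described controller.

```python
DATA_BITS = 8
CHECK_BITS = 5

# Hypothetical column assignment: bit n of DATA_COLUMNS[j] says whether
# data bit j takes part in check bit n. This is NOT the matrix of FIG. 6.
DATA_COLUMNS = [0b00111, 0b01011, 0b01101, 0b01110,
                0b10011, 0b10101, 0b10110, 0b11001]


def g_row(n: int) -> int:
    """Row n of the toy G-matrix, as a mask over the data bits."""
    return sum(1 << j for j, col in enumerate(DATA_COLUMNS) if (col >> n) & 1)


def parity(x: int) -> int:
    """Modulo-2 sum of the bits of x."""
    return bin(x).count("1") & 1


def check_bits(data: int) -> int:
    """For each row n, sum (mod 2) the data bits marked in that row."""
    return sum(parity(data & g_row(n)) << n for n in range(CHECK_BITS))
```

Because every check bit is a modulo-2 sum, the mapping is linear: the check bits of `a ^ b` equal the XOR of the check bits of `a` and of `b`.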

When reading a word from RAM memory 102, memory
and ECC controller 101 computes the 8 syndrome bits as
follows:

For the syndrome bit N, select the N'th row in the
H-matrix, where the rows are numbered 0 to 7. The 72
columns of the H-matrix correspond to the 64 data bits
and the 8 check bits of the word addressed by the
device that is requesting the memory read operation.

Compute the 1-bit sum (i.e. the modulo-2 sum) of the data
and check bits that are marked with a 1 in the selected
row. That sum is the value of syndrome bit N.

Then, memory and ECC controller 101 uses the syndrome 
to determine if an error has occurred, and if so what type of 
error, as follows: 

If all syndrome bits are zero, then the memory word is 
correct as read. 

Else, if either nibble of the syndrome (i.e. bits s0 to s3, or
bits s4 to s7) is non-zero and the other nibble contains
three one bits, then some nibble within the word read
contains a three bit or a four bit error.

Else, if the syndrome contains an even number of one bits, 
then an un-correctable error has occurred (e.g. a 
double-bit error). 

Else, if the syndrome contains an odd number of one bits, 
then a single-bit correctable error has occurred. 

In the case of a single bit error, memory and ECC 
controller 101 uses the syndrome to invert exactly one bit 
within the word as read, as follows: 

Compare the 8 syndrome bits to the 8 rows of the
H-matrix of FIG. 6 column by column. The column that
they match is the column corresponding to the bit
position that was read erroneously. For example, if the
syndrome bits are 0001 0101 (in s0 to s7 order), then
bit 7 of the word was read erroneously.

Invert whatever value was read for the bit position that 
corresponds with the matching column. In the same 
example, invert bit 7 of the word as read to generate the 
correct word. 
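
The syndrome computation, error classification and single-bit correction described above can be sketched on a toy code. The matrix of FIG. 6 is not reproduced in this text, so the sketch assumes a hypothetical 13-bit code word (8 data bits followed by 5 check bits) whose H-matrix columns all have odd weight; the nibble rule for three-bit and four-bit errors depends on the real matrix and is omitted. All names here (`COLUMNS`, `h_row`, `syndrome`, `encode`, `classify`, `correct`) are illustrative.

```python
from typing import Tuple

DATA_BITS, CHECK_BITS = 8, 5

# Hypothetical H-matrix columns: weight-3 patterns for data bits, then one
# unit column per check bit. This is NOT the matrix of FIG. 6.
DATA_COLUMNS = [0b00111, 0b01011, 0b01101, 0b01110,
                0b10011, 0b10101, 0b10110, 0b11001]
COLUMNS = DATA_COLUMNS + [1 << n for n in range(CHECK_BITS)]


def parity(x: int) -> int:
    """Modulo-2 sum of the bits of x."""
    return bin(x).count("1") & 1


def h_row(n: int) -> int:
    """Row n of the toy H-matrix, as a mask over all 13 code word bits."""
    return sum(1 << j for j, col in enumerate(COLUMNS) if (col >> n) & 1)


def syndrome(word: int) -> int:
    """Syndrome bit n is the mod-2 sum of the word bits marked in row n."""
    return sum(parity(word & h_row(n)) << n for n in range(CHECK_BITS))


def encode(data: int) -> int:
    """Append check bits so that the resulting code word has syndrome 0."""
    data &= (1 << DATA_BITS) - 1
    return data | (syndrome(data) << DATA_BITS)


def classify(s: int) -> str:
    if s == 0:
        return "ok"
    # Odd number of one bits: single-bit error; even: uncorrectable.
    return "single" if parity(s) else "uncorrectable"


def correct(word: int) -> Tuple[int, str]:
    """Invert the bit whose H-matrix column matches the syndrome, if any."""
    s = syndrome(word)
    kind = classify(s)
    if kind != "single":
        return word, kind
    if s not in COLUMNS:          # odd-weight syndrome that matches no column
        return word, "uncorrectable"
    return word ^ (1 << COLUMNS.index(s)), "corrected"
```

In this toy code a double-bit error yields the XOR of two odd-weight columns, which always has an even, non-zero number of one bits, so it is reported as uncorrectable rather than miscorrected.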

In one embodiment, the data transferred over the system 
bus is protected from errors by using the same ECC code as 
is used for RAM memory 102. In this embodiment, memory 
and ECC controller 101 performs the above described 
syndrome generation and checking (and perhaps error 
correction) on data words received from the system bus 
before they are written into memory with the same error
check bits (or perhaps with the error check bits
corrected, based on the above techniques).

CONCLUSION 

As illustrated herein, the present invention provides a 
novel and advantageous method and apparatus for correcting 
errors in RAM memory. One skilled in the art will realize 
that alternative embodiments, design alternatives and vari- 
ous changes in form and detail may be employed while 
practicing the invention without departing from its 
principles, spirit or scope. 

In particular the system architecture shown in FIG. 1, the 
control signals shown in FIG. 2, the memory map shown in 
FIG. 3, the steps in the memory scrub interrupt service 
routine shown in FIG. 4, the software/firmware/hardware
relationships shown in FIG. 5 and the ECC code shown in 
FIG. 6 may be simplified, augmented or changed in various 
embodiments of the invention. 
The following claims indicate the scope of the present
invention. Any variation which comes within the meaning 
of, or range of equivalency of, any of these claims is within 
the scope of the present invention. 

What is claimed is:

1. A computer system with memory error correction, 
comprising 

a memory to store data, the memory comprising a
plurality of words, each word comprising data bits and
error check bits, the memory being partitioned into a
plurality of sections wherein each word belongs to one
section and each section contains a plurality of words;

a processor operable to read and rewrite each word within 
an indicated section, by reading and re-writing words 
within a first subsection of the indicated section, and 
then scheduling an interrupt to read and rewrite words 
within a second subsection of the indicated section; and 

a memory controller comprising circuitry to determine, in
response to a word being read from the memory, if the
word being read contains a correctable error, and if so
to interrupt the processor, and to provide the processor
with an indication of the section to which the word
being read belongs, the interrupt occurring via an
interrupt request signal distinct from that used for
input/output interrupts.

2. The computer system according to claim 1, wherein 
said memory controller is further operable in response to a 
request to write into the memory to generate the error check 
bits based on the data bits being written. 

3. The computer system according to claim 1, wherein
said memory controller is further operable in response to a
request to read from the memory to correct an error within
any word based on the word's data bits and error check bits
as read from the memory and to provide the corrected word
in response to the request.

4. The computer system according to claim 1, wherein the
indicated section is below a predetermined byte size and the
reading and re-writing of words is performed in only the first
section.

5. The computer system according to claim 1, wherein the 
processor services the interrupt using an address space, 
processor state and register set distinct from that used during 
normal processor operation and distinct from that used for 
input/output interrupts. 

6. A method of memory error correction, comprising 
determining if a word read from a memory contains a 
correctable error, and if so: 

i) latching an indication of a section to which the
erroneous word belongs, each word belonging to one of a
plurality of sections and each section containing a
plurality of words;

ii) interrupting a processor via an interrupt request signal
distinct from that used for input/output interrupts;

iii) providing the processor with the section indication;
and

iv) reading and re-writing the words within a first
subsection of the indicated section; scheduling an
interrupt; and reading and re-writing the words within a
second subsection of the indicated section in response
to receiving the interrupt.

7. The method according to claim 6, further comprising:
generating error check bits based on the data bits of a
word being written into the memory; and
storing both the error check bits and the data bits as the
word being written.

8. The method according to claim 7, wherein the deter- 
mining of a correctable error is based on the data bits and the 
error check bits read from said memory. 

9. The method according to claim 6, wherein the indicated
section is below a predetermined byte size and the reading
and re-writing of words is performed in only the first section.

10. The method according to claim 6, wherein the pro- 
cessor services the interrupt using an address space, proces- 
sor state and register set distinct from that used during 
normal processor operation and distinct from that used for 
input/output interrupts. 

11. A computer system with memory error correction, 
comprising 

a) a memory means for storing data, the memory means 
comprising a plurality of words, each word comprising 
data bits and error check bits, the memory means being 
partitioned into a plurality of sections with each word 
belonging to one of the sections and each section 
containing a plurality of the words; 

b) a processor means for reading and writing words within
the memory means and for re-writing each word within 
an indicated one of said memory sections, wherein the 
processor means reads and re-writes the words within 
a subsection of the indicated section by signaling the 
interrupt request means to re-interrupt the processor 
means, and by reading and re-writing, in response to 
the re-interrupt, the words within a second subsection 
of the indicated section; 

c) an interrupt request means for interrupting the 
processor, the interrupt request means being distinct 
from that used for input/output interrupts; and 

d) a memory controller means for generating the error 
check bits based on the data bits of any word being 
written into the memory, for determining based on the 
data bits and the error check bits if the word accessed 
by said read contains a correctable error in response to 
said processor means reading said memory means, and 
if so for latching an indication of the section to which 
said erroneous word belongs, for interrupting said 
processor means via said interrupt request means, and 
for providing said processor means with said section 
indication. 

12. The method according to claim 10, wherein said 
processor means for reading and re-writing each word 
within the indicated section uses an address space, processor 
state and register set distinct from that used during normal 
processor operation and distinct from that used for input/ 
output interrupts. 

* * * * *